Constrained Maximum Mutual Information Dimensionality Reduction for Language Identification

Authors

  • Shuai Huang
  • Glen A. Coppersmith
  • Damianos Karakos
Abstract

In this paper we propose Constrained Maximum Mutual Information dimensionality reduction (CMMI), an information-theoretic dimensionality reduction technique. CMMI maximizes the mutual information between the class labels and the projected (lower-dimensional) features, with the projection optimized via gradient ascent. Supervised and semi-supervised variants of CMMI are introduced and compared with a state-of-the-art dimensionality reduction technique (Minimum/Maximum Rényi’s Mutual Information using the Stochastic Information Gradient; MRMI-SIG) on a language identification (LID) task using the CallFriend corpus, with favorable results. CMMI also handles higher-dimensional data more gracefully than MRMI-SIG, permitting application to datasets for which MRMI-SIG is computationally prohibitive.
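The abstract names neither the mutual information estimator nor the constraint that CMMI uses, so the following is only a minimal sketch of the general idea (gradient ascent on a linear projection to increase the mutual information between labels and projected features), not the authors' method. It assumes a Gaussian surrogate for the mutual information, an orthonormality constraint enforced by QR re-orthonormalization, and a step-halving rule; the function names, the synthetic data, and all parameter values are hypothetical.

```python
import numpy as np

def gaussian_mi(W, S, class_covs, priors):
    """Gaussian surrogate for I(Y; Z), Z = W^T x: H(Z) - sum_c p(c) H(Z | c).
    The (k/2) log(2*pi*e) constants cancel because the priors sum to one."""
    total = 0.5 * np.linalg.slogdet(W.T @ S @ W)[1]
    within = sum(p * 0.5 * np.linalg.slogdet(W.T @ Sc @ W)[1]
                 for p, Sc in zip(priors, class_covs))
    return total - within

def gaussian_mi_grad(W, S, class_covs, priors):
    """Gradient of the surrogate: d/dW 0.5*logdet(W^T S W) = S W (W^T S W)^-1."""
    grad = S @ W @ np.linalg.inv(W.T @ S @ W)
    for p, Sc in zip(priors, class_covs):
        grad -= p * Sc @ W @ np.linalg.inv(W.T @ Sc @ W)
    return grad

def fit_projection(X, y, k, steps=200, lr=0.1, seed=0):
    """Gradient ascent on a d x k projection with orthonormal columns (assumed constraint)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    classes = np.unique(y)
    priors = [np.mean(y == c) for c in classes]
    ridge = 1e-6 * np.eye(d)  # small ridge keeps covariances well conditioned
    S = np.cov(X, rowvar=False) + ridge
    class_covs = [np.cov(X[y == c], rowvar=False) + ridge for c in classes]

    W, _ = np.linalg.qr(rng.standard_normal((d, k)))
    best = gaussian_mi(W, S, class_covs, priors)
    history = [best]
    for _ in range(steps):
        G = gaussian_mi_grad(W, S, class_covs, priors)
        W_new, _ = np.linalg.qr(W + lr * G)  # re-impose orthonormal columns
        val = gaussian_mi(W_new, S, class_covs, priors)
        if val > best:
            W, best = W_new, val  # accept only improving steps
        else:
            lr *= 0.5             # otherwise shrink the step size
        history.append(best)
    return W, history

# Hypothetical toy data: two classes separated along the first of five features.
rng = np.random.default_rng(0)
n = 500
y = rng.integers(0, 2, size=n)
X = rng.standard_normal((n, 5))
X[:, 0] += 3.0 * y

W, history = fit_projection(X, y, k=1)
mi_before, mi_after = history[0], history[-1]
print(f"surrogate MI before: {mi_before:.3f}, after: {mi_after:.3f}")
```

Because this surrogate is unchanged by any invertible re-mixing of the projected coordinates, the QR step only picks a convenient basis for the subspace and does not lower the objective. A semi-supervised variant, as mentioned in the abstract, would need an additional term for unlabeled data that is not sketched here.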

Similar resources

Dimension reduction for speaker identification based on mutual information

Dimension reduction is a necessary step for speech feature extraction in a speaker identification system. Discrete Cosine Transform (DCT) or Principal Component Analysis (PCA) is widely used for dimension reduction. By choosing, from the DCT or PCA basis vector pool, the basis vectors that contribute more to the data distribution variance or to the reconstruction accuracy of the speech data set, we can transform th...

Second Order Dimensionality Reduction Using Minimum and Maximum Mutual Information Models

Conventional methods used to characterize multidimensional neural feature selectivity, such as spike-triggered covariance (STC) or maximally informative dimensions (MID), are limited to Gaussian stimuli or are only able to identify a small number of features due to the curse of dimensionality. To overcome these issues, we propose two new dimensionality reduction methods that use minimum and max...

Simple and Effective Dimensionality Reduction for Word Embeddings

Word embeddings have become the basic building blocks for several natural language processing and information retrieval tasks. Recently, there has been an emphasis on further improving the pre-trained word vectors through post-processing algorithms. One such area of improvement is the dimensionality reduction of word embeddings. Reducing the size of word embeddings through dimensionality reduct...

Dimensionality reduction based on non-parametric mutual information

In this paper we introduce a supervised linear dimensionality reduction algorithm which finds a projected input space that maximizes the mutual information between input and output values. The algorithm utilizes the recently introduced MeanNN estimator for differential entropy. We show that the estimator is an appropriate tool for the dimensionality reduction task. Next we provide a nonlinear r...

Maximum conditional mutual information projection for speech recognition

Linear discriminant analysis (LDA) in its original model-free formulation is best suited to classification problems with equal-covariance classes. Heteroscedastic discriminant analysis (HDA) removes this equal-covariance constraint, and is therefore more suitable for automatic speech recognition (ASR) systems. However, maximizing the HDA objective function does not correspond directly to minimizing ...

Publication date: 2012